Dynamic Subspace Clustering for Very Large High-Dimensional Databases

Authors

  • P. Deepa Shenoy
  • K. G. Srinivasa
  • M. P. Mithun
  • K. R. Venugopal
  • Lalit M. Patnaik
Abstract

Emerging high-dimensional data mining applications need to find interesting clusters embedded in arbitrarily aligned subspaces of lower dimensionality. High-dimensional data objects are difficult to cluster when they are sparse and skewed. Updates are common in dynamic databases and are usually processed in batch mode. In very large dynamic databases, cluster analysis must be performed incrementally, on the updates only. We present an incremental clustering algorithm for subspace clustering in very high dimensions that handles both insertions and deletions of data points in the backend databases.
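
To illustrate what processing only the updates can look like, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a simple grid-based density model over axis-parallel subspaces (a simplification of the arbitrarily aligned subspaces mentioned above), and every name in it (`IncrementalGridCounts`, `apply_batch`, `dense_cells`, `bins`, `max_dim`) is hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalGridCounts:
    """Per-cell point counts for every axis-parallel subspace up to max_dim.

    Illustrative assumption only: a fixed equal-width grid over [lo, hi)
    in each dimension. The paper's actual data structures are not given
    in the abstract.
    """

    def __init__(self, n_dims, bins=10, max_dim=2, lo=0.0, hi=1.0):
        self.bins = bins
        self.lo = lo
        self.width = (hi - lo) / bins
        # Enumerate all subspaces of dimensionality 1..max_dim.
        self.subspaces = [
            s for k in range(1, max_dim + 1)
            for s in combinations(range(n_dims), k)
        ]
        self.counts = {s: Counter() for s in self.subspaces}

    def _cell(self, point, subspace):
        # Map the point's coordinates in this subspace to grid indices,
        # clamping values that fall outside [lo, hi).
        return tuple(
            min(self.bins - 1, max(0, int((point[d] - self.lo) / self.width)))
            for d in subspace
        )

    def _update(self, point, delta):
        for s in self.subspaces:
            cell = self._cell(point, s)
            self.counts[s][cell] += delta
            if self.counts[s][cell] <= 0:
                # Drop empty cells (also guards against deleting a point
                # that was never inserted).
                del self.counts[s][cell]

    def apply_batch(self, inserts=(), deletes=()):
        """Apply one batch of updates; only the cells they touch change."""
        for p in inserts:
            self._update(p, +1)
        for p in deletes:
            self._update(p, -1)

    def dense_cells(self, threshold):
        """Return, per subspace, the cells holding at least `threshold` points."""
        return {
            s: {c for c, n in cnt.items() if n >= threshold}
            for s, cnt in self.counts.items()
        }
```

Each batch costs time proportional to the number of updated points times the number of tracked subspaces, independent of the database size. A full method would additionally merge adjacent dense cells into clusters and prune the subspace enumeration, which this sketch omits.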

Similar resources

ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimens...

Subspace outlier mining in large multimedia databases

Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering aims at grouping similar objects into clusters, separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped...

Mafia: Efficient and Scalable Subspace Clustering for Very Large Data Sets (Center for Parallel and Distributed Computing)

Clustering techniques are used in database mining for finding interesting patterns in high dimensional data. These are useful in various applications of knowledge discovery in databases. Some challenges in clustering for large data sets in terms of scalability, data distribution, understanding end-results, and sensitivity to input order, have received attention in the recent past. Recent approach...

Multi-view Subspace Clustering for High-dimensional Data

Data today tends toward more observations and very high dimensionality. Large high-dimensional data are usually sparse and contain many classes/clusters. For example, large text data in the vector space model often contains many classes of documents represented in thousands of terms. It has become a rule rather than the exception that clusters in high-dimensional data occur in subspaces of data, ...

Mining Subspace Clusters: Enhanced Models, Efficient Algorithms and an Objective Evaluation Study

In the knowledge discovery process, clustering is an established technique for grouping objects based on mutual similarity. In today's applications, however, very many attributes are provided for each object in large, high-dimensional databases. As multiple concepts described by different attributes are mixed in the same data set, clusters are hidden in subspace projections and do not appear ...

Publication date: 2003